Evaluation of the Sketch Engine Thesaurus on Analogy Queries

نویسنده

  • Pavel Rychlý
چکیده

Recent research on vector representation of words in texts bring new methods of evaluating distributional thesauri. One of such methods is the task of analogy queries. We evaluated the Sketch Engine thesaurus on a subset of analogy queries using several similarity options. We show that Jaccard similarity is better than the cosine one for bigger corpora, it even substantially outperforms the word2vec system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discovering Distributional Thesauri Semantic Relations

The paper presents technique and analysis to discover distributional thesauri relations by using statistical similarity of different word’s contexts. The application uses educational electronic text corpus and the Sketch Engine software statistical search to extract and compare word’s collocations from the related text corpus. The semantic search used is based on the evaluation and comparison o...

متن کامل

Building A Thesaurus Using LDA-Frames

In this paper we present a new method for measuring semantic relatedness of lexical units, which can be used to generate a thesaurus automatically. The method is based on a comparison of probability distributions of semantic frames generated using the LDA-frames algorithm. The idea is evaluated by measuring the overlap of WordNet synsets and generated semantic clusters. The results show that th...

متن کامل

The Sketch Engine

Word sketches are one-page automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. They were first used in the production of the Macmillan English Dictionary and were presented at Euralex 2002. At that point, they only existed for English. Now, we have developed the Sketch Engine, a corpus tool which takes as input a corpus of any language and a corresponding gram...

متن کامل

Sketching the Dependency Relations of Words in Chinese

We proposes a language resource by automatically sketching grammatical relations of words based on dependency parses from untagged texts. The advantage of word sketch based on parsed corpora is, compared to Sketch Engine (Kilgarriff, Rychly, Smrz, & Tugwell, 2004), to provide more details about the different usage of each word such as various types of modification, which is also important in la...

متن کامل

An efficient algorithm for building a distributional thesaurus (and other Sketch Engine developments)

Gorman and Curran (2006) argue that thesaurus generation for billion+-word corpora is problematic as the full computation takes many days. We present an algorithm with which the computation takes under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at time of writing) seven major world languages. The development is implemented in the Sketch Engi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016